Strong consistency of MLE for finite mixtures of location-scale distributions when the scale parameters are exponentially small

Kentaro Tanaka and Akimichi Takemura

Abstract 

In a finite mixture of location-scale distributions the maximum likelihood estimator does not exist because of the unboundedness of the likelihood function when the scale parameter of some mixture component approaches zero. In order to study the strong consistency of the maximum likelihood estimator, we consider the case that the scale parameters of the component distributions are restricted from below by $c_n$, where $\{c_n\}$ is a sequence of positive real numbers which tend to zero as the sample size $n$ increases. We prove that under mild regularity conditions the maximum likelihood estimator is strongly consistent if the scale parameters are restricted from below by $c_n = \exp(-n^d)$, $0 < d < 1$.

Key words and phrases: Mixture distribution, maximum likelihood estimator, consistency.

1 Introduction 

In some finite mixture distributions the maximum likelihood estimator (MLE) does not exist. Let us consider the following example. Denote a normal mixture distribution with $M$ components and parameter $\theta = (\alpha_1, \mu_1, \sigma_1^2, \ldots, \alpha_M, \mu_M, \sigma_M^2)$ by

$$f(x;\theta) = \sum_{m=1}^{M} \alpha_m \phi_m(x;\mu_m,\sigma_m^2),$$

where $\alpha_m$ $(m = 1, \ldots, M)$ are nonnegative real numbers that sum to one and $\phi_m(x;\mu_m,\sigma_m^2)$ are normal densities. Let $x_1, \ldots, x_n$ denote a random sample of size $n \ge 2$ from the density $f(x;\theta_0)$. In view of the identifiability problem of mixture distributions discussed below, $\theta_0$ here is a parameter value designating the true distribution. However, for simplicity we just say $\theta_0$ is the true parameter from now on. The log likelihood function is

$$\sum_{i=1}^{n} \log f(x_i;\theta) = \sum_{i=1}^{n} \log\Big\{\sum_{m=1}^{M} \alpha_m \phi_m(x_i;\mu_m,\sigma_m^2)\Big\}.$$

If we set $\mu_1 = x_1$, then the likelihood tends to infinity as $\sigma_1^2 \to 0$. Thus the MLE does not exist.
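The divergence described above can be checked numerically. The following is a minimal sketch, not part of the original argument: the sample `xs`, the equal weights and the helper names `normal_pdf` and `mixture_loglik` are illustrative assumptions. It places the first component's mean on the first observation and shrinks its variance.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_loglik(xs, alphas, mus, sigma2s):
    """Log likelihood of a finite normal mixture for the sample xs."""
    total = 0.0
    for x in xs:
        density = sum(a * normal_pdf(x, m, s2) for a, m, s2 in zip(alphas, mus, sigma2s))
        total += math.log(density)
    return total

# Illustrative sample (an assumption, not data from the paper).
xs = [0.3, -1.2, 0.8, 2.1, -0.5]

# Component 1 is centred on x_1 and its variance is driven towards zero;
# component 2 stays a standard normal so every other observation keeps
# a strictly positive density.
variances = [1e-2, 1e-6, 1e-10, 1e-14]
spike_logliks = [
    mixture_loglik(xs, [0.5, 0.5], [xs[0], 0.0], [s2, 1.0]) for s2 in variances
]

for s2, ll in zip(variances, spike_logliks):
    print(f"sigma_1^2 = {s2:.0e}  log likelihood = {ll:.3f}")
```

The printed log likelihood keeps growing as the variance shrinks, which is the unboundedness that motivates a lower bound on the scale parameters.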

But when we restrict $\sigma_m^2 \ge c$ $(m = 1, \ldots, M)$ by some positive real constant $c$, we can avoid the divergence of the likelihood. Furthermore, in this situation, it can be shown that MLE is strongly consistent if the true parameter $\theta_0$ is in the restricted parameter space.

On the other hand, the smaller $\sigma_1^2$ is, the less contribution $\phi_1(x;\mu_1 = x_1,\sigma_1^2)$ makes to the likelihood at the other observations $x_2, \ldots, x_n$. Therefore an interesting question here is whether we can decrease the bound $c = c_n$ to zero with the sample size $n$ and yet guarantee the strong consistency of MLE. If this is possible, the further question is how fast $c_n$ can decrease to zero.

This question is similar to the (so far open) problem stated in Hathaway (1985), which treats mixtures of normal distributions with constraints imposed on the ratios of variances, while our restriction is imposed on the variances themselves. See also the discussion in section 3.8.1 of McLachlan and Peel (2000).



In the above example, the normality of the component distributions is not essential and the same difficulty exists for finite mixtures of general location-scale distributions such as mixtures of uniform distributions. Furthermore, in this paper we allow each component to belong to a different location-scale family. Let $\sigma_m$ $(m = 1, \ldots, M)$ denote the scale parameters of the component distributions and consider the restriction $\sigma_m \ge c_n$ $(m = 1, \ldots, M)$. Then a question of interest here is whether we can decrease the bound $c_n$ to zero.

For the case of a mixture of uniform distributions, in Tanaka and Takemura (2005) we proved that MLE is strongly consistent if $c_n = \exp(-n^d)$, $0 < d < 1$. Here $d$ can be arbitrarily close to 1 but fixed. In this paper, we prove that the same result holds for general finite mixtures of location-scale distributions under very mild regularity conditions (assumptions 1-4 below). We employ the same line of proof as in Tanaka and Takemura (2005), but the proof for the general finite mixture is much more difficult. As discussed in section 3, the normal density satisfies the regularity conditions and our result implies that MLE is strongly consistent for the finite normal mixture if $\sigma_m \ge c_n = \exp(-n^d)$, $0 < d < 1$, $m = 1, \ldots, M$.



Our framework is closely related to the method of sieves (Grenander (1981)). In the sieve method an objective function is maximized over a constrained subspace of the parameter space and then this subspace is expanded to the whole parameter space as the sample size increases. Some applications and consistency results for the method are given in Geman and Hwang (1982). MLE based on a sieve is called a sieve MLE. The convergence rates of sieve MLE for Gaussian mixture problems are studied in Genovese and Wasserman and in Ghosal and van der Vaart (2001), and their ideas are very interesting. They obtain the convergence rates by bounding the Hellinger bracketing entropy of subsets of the function space and assume that the corresponding subsets of the parameter space are compact so that their bracketing entropy does not diverge. In the case of sieve MLE, the approximating subspaces are usually taken to be compact, whereas we treat a sequence of non-compact subsets of the parameter space expanding to the whole parameter space as the sample size increases. Therefore results on sieve MLE are not directly applicable in our framework.

The organization of the paper is as follows. In section 2 we summarize some preliminary descriptions. In section 3 we state our main results in theorems 1 and 2. Section 4 is devoted to the proof of theorems and lemmas. Finally in section 5 we give some discussions.



2 Preliminaries on strong consistency and identifiability of mixture distributions

A mixture of $M$ densities with parameter $\theta = (\alpha_1, \mu_1, \sigma_1, \ldots, \alpha_M, \mu_M, \sigma_M)$ is defined by

$$f(x;\theta) = \sum_{m=1}^{M} \alpha_m f_m(x;\mu_m,\sigma_m),$$

where $\alpha_m$, $m = 1, \ldots, M$, called the mixing weights, are nonnegative real numbers that sum to one and $f_m(x;\mu_m,\sigma_m)$, called the components of the mixture, are density functions. In this paper we consider the case that the component densities are location-scale densities with the location parameter $\mu_m \in \mathbb{R}$ and the scale parameter $\sigma_m > 0$, i.e.

$$f_m(x;\mu_m,\sigma_m) = \frac{1}{\sigma_m} f_m\Big(\frac{x-\mu_m}{\sigma_m}; 0, 1\Big). \tag{1}$$

As mentioned above, we allow the $f_m(x;\mu_m,\sigma_m)$ to belong to different families. For example, $f_1(x;\mu_1,\sigma_1)$ may be a normal density, $f_2(x;\mu_2,\sigma_2)$ may be a uniform density, etc. Let $\Omega_m = \mathbb{R} \times (0,\infty)$ denote the parameter space of the $m$-th component $(\mu_m,\sigma_m)$ and let $\Theta$ denote the entire parameter space:

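$$\Theta = \Big\{(\alpha_1, \ldots, \alpha_M) \in \mathbb{R}^M \,\Big|\, \sum_{m=1}^{M} \alpha_m = 1,\ \alpha_m \ge 0\Big\} \times \prod_{m=1}^{M} \Omega_m.$$

Let $\mathcal{K}$ be a subset of $\{1, 2, \ldots, M\}$ and let $|\mathcal{K}|$ denote the number of elements in $\mathcal{K}$. Denote by $\theta_{\mathcal{K}}$ a subvector of $\theta \in \Theta$ consisting of the components in $\mathcal{K}$. Then the parameter space of subprobability measures consisting of the components in $\mathcal{K}$ is

$$\Theta_{\mathcal{K}} = \Big\{\theta_{\mathcal{K}} \,\Big|\, \theta \in \Theta,\ \sum_{m \in \mathcal{K}} \alpha_m \le 1\Big\}. \tag{2}$$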
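The corresponding density and the set of subprobability densities are denoted by

$$f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) = \sum_{m \in \mathcal{K}} \alpha_m f_m(x;\mu_m,\sigma_m), \tag{3}$$

$$\mathcal{G}_{\mathcal{K}} = \{ f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) \mid \theta_{\mathcal{K}} \in \Theta_{\mathcal{K}} \}. \tag{4}$$

Furthermore denote the set of subprobability densities with no more than $K$ components by

$$\mathcal{G}_K = \bigcup_{|\mathcal{K}| \le K} \mathcal{G}_{\mathcal{K}} \quad (1 \le K \le M). \tag{5}$$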

We now briefly discuss identifiability of parameters. In mixture models, different parameters may designate the same distribution. When the component densities belong to a common location-scale family, we can permute the labels of the components and the distribution remains the same. A mixture model of $K - 1$ components can be obtained by setting one weight $\alpha_m = 0$ (with arbitrary $\mu_m$ and $\sigma_m$) in a model with $K$ components. These are trivial cases of unidentifiability of parameters. However, there are more complicated cases. Let $U(x;a,b)$ denote the uniform density on the interval $[a,b]$. Then, for example, $\frac{1}{3}U(x;-1,1) + \frac{2}{3}U(x;-2,2)$ and $\frac{1}{2}U(x;-2,1) + \frac{1}{2}U(x;-1,2)$ represent the same distribution (Everitt and Hand (1981)). In this case the limiting behavior of MLE is not obvious, although the estimated density should be consistent. Therefore we first give a definition of consistency in terms of the estimated density.
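The uniform-mixture example can be verified directly. The sketch below is only an illustration: the helper names and the evaluation grid are assumptions. It evaluates both mixtures, with weights 1/3, 2/3 and 1/2, 1/2, on a grid covering their common support and records the largest pointwise difference.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def mixture_a(x):
    # (1/3) U(x; -1, 1) + (2/3) U(x; -2, 2)
    return uniform_pdf(x, -1.0, 1.0) / 3.0 + 2.0 * uniform_pdf(x, -2.0, 2.0) / 3.0

def mixture_b(x):
    # (1/2) U(x; -2, 1) + (1/2) U(x; -1, 2)
    return 0.5 * uniform_pdf(x, -2.0, 1.0) + 0.5 * uniform_pdf(x, -1.0, 2.0)

# Grid from -2.5 to 2.5 in steps of 0.01 (an arbitrary illustrative choice).
grid = [-2.5 + 0.01 * k for k in range(501)]
max_gap = max(abs(mixture_a(x) - mixture_b(x)) for x in grid)
print(max_gap)
```

Both parameterizations give the density 1/6 on the outer pieces of $[-2,2]$ and 1/3 on $[-1,1]$, so the parameter is not identified even though the density is.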

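Let $f_0(x) = f(x;\theta_0)$ denote the true density and let $\hat f_n(x) = f(x;\hat\theta_n)$ denote the estimated density.

Definition 1. An estimator $\hat f_n$ is strongly consistent if

$$\mathrm{Prob}\Big(\lim_{n\to\infty} \|\hat f_n - f_0\|_1 = 0\Big) = 1,$$

where $\|g\|_1 = \int |g(x)|\,dx$ is the $L_1$-norm.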



Although definition 1 is conceptually simple, in order to prove the strong consistency of MLE we work with the location parameters, the scale parameters and the mixing weights. In order to deal with the identifiability problem let us introduce a distance between two sets of parameters. Let $\mathrm{dist}(\theta,\theta')$ denote the ordinary Euclidean distance (or any other equivalent distance) between two parameter vectors $\theta, \theta' \in \Theta$. For $U, V \subset \Theta$ define

$$\mathrm{dist}(U,V) = \inf_{\theta \in U} \inf_{\theta' \in V} \mathrm{dist}(\theta,\theta').$$

For a parameter $\theta$, let

$$\Theta(\theta) = \{\theta' \in \Theta \mid f(x;\theta') = f(x;\theta)\ \forall x\}.$$

Then $\Theta_0 = \Theta(\theta_0)$ denotes the set of true parameters. Since our densities are continuous with respect to $\theta$, by Scheffé's theorem (Theorem 16.12 of Billingsley (1995)), $\mathrm{dist}(\Theta(\hat\theta_n), \Theta_0) \to 0$ implies $\|\hat f_n - f_0\|_1 \to 0$.
3 Main results

We assume the following regularity conditions for strong consistency of MLE. 
Assumption 1. There exist real constants $v_0, v_1 > 0$ and $\beta > 1$ such that

$$f_m(x;\mu_m = 0,\sigma_m = 1) \le \min\{v_0,\ v_1 \cdot |x|^{-\beta}\}$$

for all $m$.

This assumption means that the $f_m$ $(m = 1, \ldots, M)$ are bounded and their tails decrease to zero at least as fast as $|x|^{-\beta}$, which is a very mild condition.
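As a concrete instance of assumption 1, the standard normal density satisfies the bound with $\beta = 2$. The sketch below is illustrative only: the constants are one admissible choice, not values taken from the paper. It uses $v_0 = 1/\sqrt{2\pi}$ and $v_1 = \sup_z z^2\phi(z) = 2e^{-1}/\sqrt{2\pi}$, attained at $|z| = \sqrt{2}$, and checks the inequality on a grid.

```python
import math

def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# One admissible choice of constants for the normal density (an assumption
# made for illustration): beta = 2, v0 = sup phi, v1 = sup z^2 * phi(z).
BETA = 2.0
V0 = 1.0 / math.sqrt(2.0 * math.pi)
V1 = 2.0 * math.exp(-1.0) / math.sqrt(2.0 * math.pi)

def assumption1_bound(z):
    """Right-hand side min{v0, v1 * |z|^(-beta)} of assumption 1."""
    if z == 0.0:
        return V0
    return min(V0, V1 * abs(z) ** (-BETA))

grid = [k / 100.0 for k in range(-1000, 1001)]
worst_gap = max(std_normal_pdf(z) - assumption1_bound(z) for z in grid)
print(worst_gap)  # non-positive up to rounding when the bound holds
```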

The following three regularity conditions are standard conditions assumed in discussing strong consistency of MLE. Let $\Gamma$ denote any compact subset of $\Theta$.

Assumption 2. For $\theta \in \Theta$ and any positive real number $\rho$, let

$$f(x;\theta,\rho) = \sup_{\mathrm{dist}(\theta',\theta) \le \rho} f(x;\theta').$$

For each $\theta \in \Gamma$ and sufficiently small $\rho$, $f(x;\theta,\rho)$ is measurable.

Assumption 3. For each $\theta \in \Gamma$, if $\lim_{j\to\infty} \theta^{(j)} = \theta$ $(\theta^{(j)} \in \Gamma)$, then $\lim_{j\to\infty} f(x;\theta^{(j)}) = f(x;\theta)$ except on a set which is a null set and does not depend on the sequence $\{\theta^{(j)}\}_{j=1}^{\infty}$.

Assumption 4.

$$\int |\log f(x;\theta_0)|\, f(x;\theta_0)\,dx < \infty.$$

Let $E_0[\cdot]$ denote the expectation under the true parameter $\theta_0$. The following theorem is essential to our argument and it is of some independent interest.

Theorem 1. Suppose that assumptions 1-4 are satisfied and $f_0 \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$, where $\mathcal{G}_M$ and $\mathcal{G}_{M-1}$ are defined in (5). Then there exist real constants $\kappa, \lambda > 0$ such that

$$E_0[\log\{g(x) + \kappa\}] + \lambda < E_0[\log f(x;\theta_0)] \tag{6}$$

for all $g \in \mathcal{G}_{M-1}$.

We now state the main theorem of this paper. 

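Theorem 2. Suppose that assumptions 1-4 are satisfied and $f_0 \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$, where $\mathcal{G}_M$ and $\mathcal{G}_{M-1}$ are defined in (5). Let $c_0 > 0$ and $0 < d < 1$. If $c_n = c_0 \cdot \exp(-n^d)$ and

$$\Theta_n = \{\theta \in \Theta \mid \sigma_m \ge c_n\ (m = 1, \ldots, M)\},$$

then

$$\mathrm{Prob}\Big(\lim_{n\to\infty} \mathrm{dist}(\Theta(\hat\theta_n), \Theta_0) = 0\Big) = 1,$$

where $\hat\theta_n$ is MLE restricted to $\Theta_n$.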
As remarked at the end of the previous section, theorem 2 implies the following corollary.

Corollary 1. Under the same assumptions as in theorem 2, $\hat f_n$ is strongly consistent in the sense of definition 1.
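To get a feeling for how quickly the bound in theorem 2 shrinks, the following sketch evaluates $c_n = c_0\exp(-n^d)$ and tests whether a vector of scale parameters lies in the restricted space. The function names and the default values `c0=1.0`, `d=0.5` are illustrative assumptions, not choices made in the paper.

```python
import math

def scale_lower_bound(n, c0=1.0, d=0.5):
    """Lower bound c_n = c0 * exp(-n^d) on the scale parameters."""
    if c0 <= 0.0 or not (0.0 < d < 1.0):
        raise ValueError("need c0 > 0 and 0 < d < 1")
    return c0 * math.exp(-(n ** d))

def in_restricted_space(sigmas, n, c0=1.0, d=0.5):
    """True if every scale parameter satisfies sigma_m >= c_n."""
    cn = scale_lower_bound(n, c0, d)
    return all(s >= cn for s in sigmas)

for n in (10, 100, 1000):
    print(n, scale_lower_bound(n))
```

A constrained optimizer would only need to reject (or project) candidate scale vectors for which `in_restricted_space` is false; already at moderate $n$ the bound is far below any scale one would meet in practice.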

4 Proofs 

In this section, we prove the theorems stated in section 3. The organization of this section is as follows. First in subsection 4.1 we state some lemmas for theorems 1 and 2. Next in subsection 4.2 we prove theorem 1, which is also essential for theorem 2. Finally we prove theorem 2 in subsection 4.3. For convenience a list of notations used in our proofs is provided at the end of this paper.

4.1 Notations and some lemmas 

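Fix arbitrary $\kappa_0 > 0$, which corresponds to $\kappa$ in theorem 1. Define $\tilde\beta$ and $\nu(y)$, $y > 0$, as

$$\tilde\beta = \frac{\beta - 1}{\beta}, \qquad \nu(y) = \Big(\frac{v_1}{\kappa_0}\Big)^{1/\beta} \cdot y^{\tilde\beta}, \tag{7}$$

where $v_1$ and $\beta$ are given in assumption 1. Noting that $\frac{1}{y} \cdot v_1 \cdot (\nu(y)/y)^{-\beta} = \kappa_0$, the following lemma is easily proved and we omit its proof. See figure 1.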

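Lemma 1. Under assumption 1, for arbitrary $\kappa_0 > 0$ each component density $f_m(x;\mu,\sigma)$ is bounded by a step function

$$f_m(x;\mu,\sigma) \le \max\Big\{1_{[\mu-\nu(\sigma),\,\mu+\nu(\sigma))}(x) \cdot \frac{v_0}{\sigma},\ \kappa_0\Big\} \le 1_{[\mu-\nu(\sigma),\,\mu+\nu(\sigma))}(x) \cdot \frac{v_0}{\sigma} + \kappa_0,$$

where $1_U(x)$ denotes the indicator function of $U \subset \mathbb{R}$.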
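The step-function bound of lemma 1 can be checked numerically for a normal component. The sketch below is a hedged illustration: it assumes $\nu(\sigma) = (v_1/\kappa_0)^{1/\beta}\sigma^{(\beta-1)/\beta}$, the half-width at which the tail bound $v_1\sigma^{\beta-1}|x-\mu|^{-\beta}$ equals $\kappa_0$, together with the illustrative normal constants $\beta = 2$, $v_0 = 1/\sqrt{2\pi}$, $v_1 = 2e^{-1}/\sqrt{2\pi}$. All function names are hypothetical.

```python
import math

# Illustrative constants for the normal density (assumptions, see lead-in).
BETA = 2.0
V0 = 1.0 / math.sqrt(2.0 * math.pi)
V1 = 2.0 * math.exp(-1.0) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu, sigma):
    """Location-scale normal density (1/sigma) * phi((x - mu) / sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def nu(sigma, kappa0):
    """Assumed half-width: outside [mu - nu, mu + nu) the tail bound is <= kappa0."""
    return (V1 / kappa0) ** (1.0 / BETA) * sigma ** ((BETA - 1.0) / BETA)

def step_bound(x, mu, sigma, kappa0):
    """max{ indicator * v0 / sigma, kappa0 } from lemma 1."""
    half = nu(sigma, kappa0)
    inside = (mu - half) <= x < (mu + half)
    return max(V0 / sigma if inside else 0.0, kappa0)

def max_violation(mu, sigma, kappa0, lo=-10.0, hi=10.0, steps=4001):
    """Largest value of density minus bound over a grid (should be <= 0)."""
    worst = -math.inf
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        worst = max(worst, normal_pdf(x, mu, sigma) - step_bound(x, mu, sigma, kappa0))
    return worst

violations = [max_violation(0.5, s, 0.05) for s in (0.01, 0.1, 1.0, 5.0)]
print(violations)
```

Small $\sigma$ gives a tall, narrow step and large $\sigma$ a low, wide one, which is the trade-off exploited later when the width of a peak is bounded in terms of its height.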





Figure 1: Each component is bounded by a step function.
From lemma 1,

$$\sum_{m=1}^{M} \alpha_m f_m(x;\mu_m,\sigma_m) \le \sum_{m=1}^{M} 1_{[\mu_m-\nu(\sigma_m),\,\mu_m+\nu(\sigma_m))}(x) \cdot \frac{v_0}{\sigma_m} + \kappa_0. \tag{8}$$

The right-hand side of (8) is a step function. We look at this step function where the density $f(x;\theta)$ is high, i.e. where the scale parameter of some component is small. For a given choice of $\kappa_0 > 0$, choose $c_0 > 0$ such that

$$c_0 \le \frac{v_0}{\kappa_0 (M+1)}. \tag{9}$$

Below we will impose additional constraints on $\kappa_0$ and $c_0$ to make $\kappa_0$ and $c_0$ sufficiently small to satisfy other conditions. For each $\theta$, let

$$\mathcal{K}_{\sigma \le c_0} = \mathcal{K}_{\sigma \le c_0}(\theta) = \{m \mid 1 \le m \le M,\ \sigma_m \le c_0\}$$

denote the set of components with $\sigma_m \le c_0$ and define

$$J(\theta) = \bigcup_{m \in \mathcal{K}_{\sigma \le c_0}} [\mu_m - \nu(\sigma_m),\ \mu_m + \nu(\sigma_m)). \tag{10}$$

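On $J(\theta)$ the density $f(x;\theta)$ is high. Now dividing $J(\theta)$ according to the height of the step function on the right-hand side of (8), for $x \in J(\theta)$ we can write the right-hand side of (8) as

$$\sum_{m=1}^{M} 1_{[\mu_m-\nu(\sigma_m),\,\mu_m+\nu(\sigma_m))}(x) \cdot \frac{v_0}{\sigma_m} + \kappa_0 = \sum_{t=1}^{T(\theta)} H(J_t(\theta)) \cdot 1_{J_t(\theta)}(x),$$

where $J_t(\theta)$ $(t = 1, \ldots, T(\theta))$ are disjoint intervals, the intervals $[\mu_m-\nu(\sigma_m),\,\mu_m+\nu(\sigma_m))$ $(m \in \mathcal{K}_{\sigma \le c_0})$ are unions of some of the $J_t(\theta)$'s and the height $H(J_t(\theta))$ for each $t$ is defined by any $x \in J_t(\theta)$ as

$$H(J_t(\theta)) = \sum_{m=1}^{M} 1_{[\mu_m-\nu(\sigma_m),\,\mu_m+\nu(\sigma_m))}(x) \cdot \frac{v_0}{\sigma_m} + \kappa_0. \tag{11}$$

See figure 2. For $x \in J_t(\theta)$, there is at least one $m = m_t$ such that $x \in [\mu_m-\nu(\sigma_m),\,\mu_m+\nu(\sigma_m))$ and $H(J_t(\theta)) \ge v_0/c_0 + \kappa_0$. Also note that the total number $T(\theta)$ of $J_t(\theta)$'s satisfies $T(\theta) \le 2M$, because the change of the height can only occur at $\mu_m - \nu(\sigma_m)$ or $\mu_m + \nu(\sigma_m)$. By (8) we have the following lemma for $x \in J(\theta)$.

Lemma 2. Under assumption 1, for each $x \in J(\theta)$

$$\sum_{m=1}^{M} \alpha_m f_m(x;\mu_m,\sigma_m) \le \sum_{t=1}^{T(\theta)} H(J_t(\theta)) \cdot 1_{J_t(\theta)}(x).$$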
m=l t=l 
Figure 2: Definition of $J_t(\theta)$.



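A density can be high only in a small region and we want to have some explicit bound on the length $W(J_t(\theta))$ of $J_t(\theta)$ in terms of its height $H(J_t(\theta))$. Let

$$v_2 = 2\Big(\frac{v_1}{\kappa_0}\Big)^{1/\beta} \cdot (v_0 \cdot (M+1))^{\tilde\beta}, \qquad \xi(y) = v_2 \cdot \Big(\frac{1}{y}\Big)^{\tilde\beta}, \quad y > 0, \tag{12}$$

where $v_0$, $v_1$ and $\beta$ are given in assumption 1 and $\tilde\beta$ is defined in (7).

Lemma 3. Under assumption 1, the length $W(J_t(\theta))$ of $J_t(\theta)$ for each $t$ is bounded as

$$W(J_t(\theta)) \le \xi(H(J_t(\theta))).$$

Proof: In (11), at any $x \in J_t(\theta)$, $H(J_t(\theta)) - \kappa_0$ consists of at most $M$ components. Thus, for each $J_t(\theta)$, there exists at least one component $m = m_t$ such that

$$\frac{v_0}{\sigma_m} \ge \frac{1}{M} \cdot \big(H(J_t(\theta)) - \kappa_0\big).$$

Furthermore from (9) we have

$$\kappa_0 \le \frac{v_0}{c_0 (M+1)} \le \frac{H(J_t(\theta))}{M+1}, \qquad H(J_t(\theta)) - \kappa_0 \ge H(J_t(\theta)) - \frac{H(J_t(\theta))}{M+1} = \frac{M \cdot H(J_t(\theta))}{M+1}.$$

Therefore we have

$$\nu(\sigma_m) \le \Big(\frac{v_1}{\kappa_0}\Big)^{1/\beta} \Big(\frac{v_0 (M+1)}{H(J_t(\theta))}\Big)^{\tilde\beta}.$$

This with $W(J_t(\theta)) \le 2\nu(\sigma_m)$ proves the lemma. □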

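So far we have been concerned with bounding the density at its peaks. Now we consider bounding the tail of the true density $f(x;\theta_0)$. Write $\bar\mu_0 = \max(|\mu_{01}|, \ldots, |\mu_{0M}|)$ and $\theta_0 = (\alpha_{01}, \mu_{01}, \sigma_{01}, \ldots, \alpha_{0M}, \mu_{0M}, \sigma_{0M})$. Let

$$u_0 = \sup_x f(x;\theta_0), \qquad u_1 = \max\Big(u_0 \cdot (2\bar\mu_0)^{\beta},\ 2^{\beta} v_1 \sum_{m=1}^{M} \alpha_{0m}\sigma_{0m}^{\beta-1}\Big). \tag{13}$$

Lemma 4. Under assumption 1, the following inequality holds.

$$f(x;\theta_0) \le \min\{u_0,\ u_1 \cdot |x|^{-\beta}\}, \quad \forall x \in \mathbb{R}.$$

Proof: From assumption 1,

$$f(x;\theta_0) \le v_1 \sum_{m=1}^{M} \alpha_{0m}\sigma_{0m}^{\beta-1} |x - \mu_{0m}|^{-\beta}.$$

Then for $|x| > 2\bar\mu_0$

$$|x - \mu_{0m}|^{-\beta} \le (|x| - \bar\mu_0)^{-\beta} \le 2^{\beta}|x|^{-\beta}, \quad (m = 1, \ldots, M).$$

Therefore for $|x| > 2\bar\mu_0$

$$f(x;\theta_0) \le 2^{\beta} v_1 \sum_{m=1}^{M} \alpha_{0m}\sigma_{0m}^{\beta-1} \cdot |x|^{-\beta} \le u_1 \cdot |x|^{-\beta},$$

while for $|x| \le 2\bar\mu_0$ we have $f(x;\theta_0) \le u_0 \le u_0 \cdot (2\bar\mu_0)^{\beta}|x|^{-\beta} \le u_1 \cdot |x|^{-\beta}$, and

$$f(x;\theta_0) \le \min\{u_0,\ u_1 \cdot |x|^{-\beta}\}, \quad \forall x \in \mathbb{R}.$$

□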

Based on lemma 4 we can bound the behavior of the minimum and the maximum of the sample. Let $x_1, \ldots, x_n$ denote a random sample of size $n$ from $f(x;\theta_0)$ and let

$$x_{n,1} = \min\{x_1, \ldots, x_n\}, \qquad x_{n,n} = \max\{x_1, \ldots, x_n\}.$$

The following lemma follows from the Borel-Cantelli lemma.

Lemma 5. For any real constants $A_0 > 0$ and $\zeta > 0$, define

$$A_n = A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}.$$

Then

$$\mathrm{Prob}(x_{n,1} < -A_n \text{ or } x_{n,n} > A_n \text{ i.o.}) = 0.$$
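Proof: By the Borel-Cantelli lemma and the Bonferroni inequality it suffices to show that

$$\sum_{n=1}^{\infty} \mathrm{Prob}(x_{n,1} < -A_n) < \infty, \qquad \sum_{n=1}^{\infty} \mathrm{Prob}(x_{n,n} > A_n) < \infty.$$

We consider the left tail. Let $F_0(x)$ denote the distribution function of $f(x;\theta_0)$. Then

$$\mathrm{Prob}(x_{n,1} < -A_n) = 1 - (1 - F_0(-A_n))^n,$$

and

$$F_0(-A_n) \le \int_{-\infty}^{-A_n} u_1 \cdot |x|^{-\beta}\,dx = \frac{u_1 A_n^{-(\beta-1)}}{\beta-1}.$$

By replacing $n$ by $n - n_0$ with a sufficiently large $n_0$ if necessary, we can assume without loss of generality that

$$\frac{u_1 A_n^{-(\beta-1)}}{\beta-1} < 1, \quad \forall n.$$

Then

$$\log(1 - F_0(-A_n))^n \ge n\log\Big(1 - \frac{u_1 A_n^{-(\beta-1)}}{\beta-1}\Big) = n\log\Big(1 - \frac{u_1 A_0^{-(\beta-1)}}{\beta-1} \cdot \frac{1}{n^{2+\zeta}}\Big).$$

Let $u_2 = u_1 A_0^{-(\beta-1)}/(\beta-1)$ and we have

$$\big|\log(1 - F_0(-A_n))^n\big| \le \Big|n\log\Big(1 - \frac{u_2}{n^{2+\zeta}}\Big)\Big| = \frac{u_2}{n^{1+\zeta}} \cdot \Big|\frac{n^{2+\zeta}}{u_2}\log\Big(1 - \frac{u_2}{n^{2+\zeta}}\Big)\Big|.$$

Hence there exist a sufficiently large $N$ and $u_3 > 0$ such that

$$\big|\log(1 - F_0(-A_n))^n\big| \le \frac{u_3}{n^{1+\zeta}}$$

for all $n > N$. This and $(1 - F_0(-A_n))^n \le 1$ imply that for $n > N$

$$\log(1 - F_0(-A_n))^n \ge -\frac{u_3}{n^{1+\zeta}}.$$

Hence by $1 - e^{-y} \le y$, we have for $n > N$

$$\mathrm{Prob}(x_{n,1} < -A_n) = 1 - (1 - F_0(-A_n))^n \le 1 - \exp\Big(-\frac{u_3}{n^{1+\zeta}}\Big) \le \frac{u_3}{n^{1+\zeta}}.$$

Therefore we obtain

$$\sum_{n > N} \mathrm{Prob}(x_{n,1} < -A_n) = \sum_{n > N} \big\{1 - (1 - F_0(-A_n))^n\big\} \le \sum_{n > N} \frac{u_3}{n^{1+\zeta}} < \infty.$$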

The case of the right tail $\mathrm{Prob}(x_{n,n} > A_n)$ is proved by the same argument. □
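The summability in lemma 5 can be illustrated with a single standard Cauchy component, for which $\beta = 2$ and $u_1 = 1/\pi$ is an admissible constant. This is a sketch under those assumptions; it uses the cruder Bernoulli bound $1-(1-F)^n \le nF$ instead of the exact steps of the proof, and the values of `A0` and `ZETA` are arbitrary.

```python
import math

BETA = 2.0
U1 = 1.0 / math.pi          # Cauchy density is at most U1 * |x|^(-2)
A0, ZETA = 1.0, 0.5         # illustrative choices

def a_n(n):
    """Threshold A_n = A0 * n^((2 + zeta) / (beta - 1))."""
    return A0 * n ** ((2.0 + ZETA) / (BETA - 1.0))

def left_tail(a):
    """F_0(-a) for the standard Cauchy, written to avoid cancellation."""
    return math.atan(1.0 / a) / math.pi

def prob_min_below(n):
    """Exact Prob(x_{n,1} < -A_n) = 1 - (1 - F_0(-A_n))^n."""
    return -math.expm1(n * math.log1p(-left_tail(a_n(n))))

U2 = U1 * A0 ** (-(BETA - 1.0)) / (BETA - 1.0)

def summable_bound(n):
    """Bound n * u2 / n^(2 + zeta) = u2 / n^(1 + zeta)."""
    return U2 / n ** (1.0 + ZETA)

ns = range(1, 201)
probs = [prob_min_below(n) for n in ns]
bounds = [summable_bound(n) for n in ns]
print(sum(probs), sum(bounds))
```

Each exact probability sits below a term of a convergent series, so the Borel-Cantelli lemma applies and the sample minimum falls below $-A_n$ only finitely often.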
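Finally we consider subprobability densities in $\mathcal{G}_{\mathcal{K}}$. For any positive real number $\rho$, let

$$f_{\mathcal{K}}(x;\theta_{\mathcal{K}},\rho) = \sup_{\mathrm{dist}(\theta'_{\mathcal{K}},\theta_{\mathcal{K}}) \le \rho} f_{\mathcal{K}}(x;\theta'_{\mathcal{K}}), \quad \theta_{\mathcal{K}} \in \Theta_{\mathcal{K}}. \tag{14}$$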

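The following lemma follows from the bounded convergence theorem.

Lemma 6. Let $\Gamma_{\mathcal{K}}$ denote any compact subset of $\Theta_{\mathcal{K}}$. For any real constant $\kappa \ge 0$ and any point $\theta_{\mathcal{K}} \in \Gamma_{\mathcal{K}}$, the following equality holds under assumptions 1 and 3.

$$\lim_{\rho \to 0} E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}},\rho) + \kappa\}] = E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) + \kappa\}].$$

Proof: We treat the case of $\kappa > 0$. The case of $\kappa = 0$ is almost the same as the proof of Lemma 2 in Wald (1949). From assumption 3 we have

$$\lim_{\rho \to 0} \log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}},\rho) + \kappa\} = \log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) + \kappa\} \quad \text{a.e.}$$

Now $\Gamma_{\mathcal{K}}$ is compact and $\kappa > 0$. Hence by assumption 1, $\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}},\rho) + \kappa\}$ is bounded. Therefore

$$\lim_{\rho \to 0} E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}},\rho) + \kappa\}] = E_0[\log\{f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) + \kappa\}]$$

by the bounded convergence theorem. □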



4.2 Proof of theorem 1

In this section we prove theorem 1 by contradiction. Fix an arbitrary proper subset $\mathcal{K}$ of $\{1, \ldots, M\}$. It suffices to prove that (6) holds for all $g \in \mathcal{G}_{\mathcal{K}}$. Suppose that (6) does not hold for some $\mathcal{K}$. Then for any $\lambda, \kappa > 0$, there exists $g \in \mathcal{G}_{\mathcal{K}}$ such that

$$E_0[\log\{g(x) + \kappa\}] + \lambda \ge E_0[\log f(x;\theta_0)].$$

Here, let $\{\lambda_j\}$, $\{\kappa_j\}$ be positive sequences which decrease to zero. Then for each $\lambda_j, \kappa_j > 0$, there exists $g_j \in \mathcal{G}_{\mathcal{K}}$ such that

$$E_0[\log\{g_j(x) + \kappa_j\}] + \lambda_j \ge E_0[\log f(x;\theta_0)].$$

It follows that

$$\liminf_{j\to\infty} E_0[\log\{g_j(x) + \kappa_j\}] + \lambda_j \ge E_0[\log f(x;\theta_0)]. \tag{15}$$

Now $g_j$ can be written as

$$g_j(x) = \sum_{m \in \mathcal{K}} \alpha_m^{(j)} f_m(x;\mu_m^{(j)},\sigma_m^{(j)}).$$

Then the following lemma holds by a compactification argument.

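Lemma 7. There exists a subsequence of $\{\theta_{\mathcal{K}}^{(j)}\}_{j=1}^{\infty} = \{\{\alpha_m^{(j)}, \mu_m^{(j)}, \sigma_m^{(j)} \mid m \in \mathcal{K}\}\}_{j=1}^{\infty}$ and disjoint subsets $\mathcal{K}_{\sigma\downarrow 0}$, $\mathcal{K}_{\sigma\uparrow\infty}$, $\mathcal{K}_{|\mu|\uparrow\infty} \subset \mathcal{K}$ such that along the subsequence:

$\sigma_m^{(j)} \to 0$ for $m \in \mathcal{K}_{\sigma\downarrow 0}$,

$\sigma_m^{(j)} \to \infty$ for $m \in \mathcal{K}_{\sigma\uparrow\infty}$,

$\sigma_m^{(j)}$ converges to a finite value and $|\mu_m^{(j)}| \to \infty$ for $m \in \mathcal{K}_{|\mu|\uparrow\infty}$,

$(\alpha_m^{(j)}, \mu_m^{(j)}, \sigma_m^{(j)})$ converges to a finite point $(\alpha_m^{(\infty)}, \mu_m^{(\infty)}, \sigma_m^{(\infty)})$ for $m \in \mathcal{K}_R$,

where $\mathcal{K}_R = \mathcal{K} \setminus \{\mathcal{K}_{\sigma\downarrow 0} \cup \mathcal{K}_{\sigma\uparrow\infty} \cup \mathcal{K}_{|\mu|\uparrow\infty}\}$.

Proof: Let

$$\mu_m'^{(j)} = \arctan(\mu_m^{(j)}), \qquad \sigma_m'^{(j)} = \arctan(\sigma_m^{(j)}).$$

Then $\{\{\alpha_m^{(j)}, \mu_m'^{(j)}, \sigma_m'^{(j)} \mid m \in \mathcal{K}\}\}_{j=1}^{\infty}$ is regarded as a sequence in the following compact set.

$$0 \le \alpha_m \le 1, \quad \sum_{m \in \mathcal{K}} \alpha_m \le 1, \quad -\frac{\pi}{2} \le \mu_m' \le \frac{\pi}{2}, \quad 0 \le \sigma_m' \le \frac{\pi}{2}. \tag{16}$$

Therefore there exists a subsequence of $\{\{\alpha_m^{(j)}, \mu_m'^{(j)}, \sigma_m'^{(j)} \mid m \in \mathcal{K}\}\}_{j=1}^{\infty}$ that converges to a point in the set (16). Now we sort the elements in $\mathcal{K}$ according to their behavior in this subsequence. First, we choose the components such that $\sigma_m \to 0$, and add these $m$ to $\mathcal{K}_{\sigma\downarrow 0}$. Second, from the remainder, we choose the components such that $\sigma_m \to \infty$, and add these $m$ to $\mathcal{K}_{\sigma\uparrow\infty}$. Third, from the remainder, we choose the components such that $|\mu_m| \to \infty$ and add these $m$ to $\mathcal{K}_{|\mu|\uparrow\infty}$. Finally, we choose the remaining components as $\mathcal{K}_R$. □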

From lemma 7 we define $g_\infty$ as follows.

$$g_\infty(x) = \sum_{m \in \mathcal{K}_R} \alpha_m^{(\infty)} f_m(x;\mu_m^{(\infty)},\sigma_m^{(\infty)}).$$

For notational simplicity and without loss of generality, we replace the original sequence with this subsequence, because (15) holds for this subsequence as well. Furthermore, by considering the sequence $\{\theta_{\mathcal{K}}^{(j)}\}_{j=j_0}^{\infty}$ where $j_0$ is sufficiently large and replacing $j$ by $j - j_0$ if necessary, we can assume without loss of generality that there exist sufficiently small real constants $\kappa_0 > 0$ and $c_0 > 0$ such that

Eo[\ogf{x; 9o)] - Eo [\og{goo{x) + Skq}] > , ft:o < 



Co(M+l) 



< Co {me JCio) , CT^^ > — {me XtooJ , 
Co < cr^^ < — ("^ e ^^itoo) for all j . (17) 



From lemma Q and El we have 



< j l^(,u))(x) ■ log <^ H{Jt{e%^)) ■ + \f{x; ^o)dx 



where =J^>o = ^\<J^aiQ- 

Now we evaluate the first term on the right-hand side of (18). From lemma 3,

y l^(,(.))(x)-log<^ 5^ /7(Jt(^^^)))-l,^(,U))(x) + S-^/(x;eo)dx 



(1^ 



< 5^ W^(Jt(4)))-log|iJ(Ji(4))) + /t,}-Mo-^0 , (n-^oo), (19) 
t=i 

where $u_0 = \sup_x f(x;\theta_0)$ is defined in (13). Next we evaluate the second term on the right-hand side of (18). Let



min {mm{\fj^ + u{a'£)\,\f4i^-u{a'£)\}}. (20) 

Then 

/^.To.(^; (^.J + /%iTo.(^; < '^o for X e [-A^^\ A(^-)]\J(4)) . 
Therefore the following inequality holds



< j l[_AO),^o)]\j(e^))(^) ■ log + 2«:o + /tij/la;; ^o)dx 

+ y ^{(-oo -A(i))U(A(i),oo)}\J(e;^')(^) ■ log {/-^M|Too(^' ^SJmIToc) 

^ + (say). (21) 

By the bounded convergence theorem, we obtain 

^ /■ \og{g^{x) + 2«:o}/(x; eo)dx , 1^'^ ^ 0. (22) 



From (HHI), (Uni), (EH), (1221), we have 
$$E_0[\log f(x;\theta_0)] \le \limsup_{j\to\infty} E_0[\log\{g_j(x) + \kappa_j\}] + \lambda_j \le E_0[\log\{g_\infty(x) + 3\kappa_0\}].$$

This is a contradiction to (17). This completes the proof of theorem 1.

4.3 Proof of the main theorem 

We choose real constants $\kappa$ and $\lambda$ to satisfy (6) by using theorem 1. Having chosen these constants, from now on we follow the line of the proof in Tanaka and Takemura (2005), although the details of the proof here are much more complicated. For the sake of readability we divide our proof into further sections.

4.3.1 Setting up constants 

For $\kappa, \lambda$ satisfying (6), let $\kappa_0, \lambda_0$ be real constants such that

$$0 < 4\kappa_0 < \kappa, \qquad 0 < 4\lambda_0 < \lambda.$$

Note that $4\kappa_0, 4\lambda_0$ also satisfy (6). Define

$$B = \frac{v_0}{\kappa_0} > \max\{\sigma_{01}, \ldots, \sigma_{0M}\}. \tag{23}$$

If $\sigma_m \ge B$, then the density of the $m$-th component is almost flat and makes little contribution to the likelihood. In section 4.3.2 we partition the parameter space according to this property.
Because $\{c_n\}$ is decreasing to zero, by replacing $c_0$ by some $c_n$ if necessary, we can assume without loss of generality that $c_0$ is sufficiently small to satisfy the following conditions,

$$(v_0/c_0)^{\tilde\beta} \ge e,$$
$$c_0 \le \min\{\sigma_{01}, \ldots, \sigma_{0M}\},$$
$$3M \cdot u_0 \cdot 2\nu(c_0) \cdot |\log\kappa_0| \le \lambda_0,$$
$$3 \cdot 2M \cdot u_0 \cdot \xi(v_0/c_0) \cdot \log(v_0/c_0) \le \lambda_0, \tag{24}$$

where $\tilde\beta$, $\nu(\cdot)$ and $\xi(\cdot)$ are defined in (7) and (12).

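For any subset $V \subset \mathbb{R}$, let $P_0(V)$ denote the probability of $V$ under the true density

$$P_0(V) = \int_V f(x;\theta_0)\,dx. \tag{25}$$

Let $A_0 > 0$ be a positive constant which satisfies

$$P_0(\mathcal{A}_0) \cdot \log\Big(\frac{v_0/c_0 + 2\kappa_0}{3\kappa_0}\Big) \le \lambda_0, \tag{26}$$

where

$$\mathcal{A}_0 = (-\infty, -A_0] \cup [A_0, \infty). \tag{27}$$

Let $A_n = A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}$ as in lemma 5. Define a subset of $\Theta_n$ in theorem 2 by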

Q'n = {0 e Qn \ 3m S.t. Cn<<7m< Cq Or > Aq} C 9„ , 

and let

$$\Gamma_0 = \{\theta \in \Theta \mid c_0 \le \sigma_m \le B,\ |\mu_m| \le A_0,\ (m = 1, \ldots, M)\}.$$

Note that $\Theta_0 \subset \Gamma_0$, where $\Theta_0$ is the set of true parameters.

4.3.2 Partitioning the parameter space 

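In view of theorems in Wald (1949), Redner (1981), for the strong consistency of MLE on $\Theta_n$ under assumptions 1, 2, 3 and 4, it suffices to prove that

$$\lim_{n\to\infty} \frac{\sup_{\theta \in \Gamma \cup \Theta'_n} \prod_{i=1}^{n} f(x_i;\theta)}{\prod_{i=1}^{n} f(x_i;\theta_0)} = 0, \quad \text{a.e.}$$

for all closed $\Gamma \subset \Gamma_0$ not intersecting $\Theta_0$. Note that for all $\Gamma$ and $\{x_i\}_{i=1}^{n}$,

$$\sup_{\theta \in \Gamma \cup \Theta'_n} \prod_{i=1}^{n} f(x_i;\theta) = \max\Big\{\sup_{\theta \in \Gamma} \prod_{i=1}^{n} f(x_i;\theta),\ \sup_{\theta \in \Theta'_n} \prod_{i=1}^{n} f(x_i;\theta)\Big\}.$$

Furthermore

$$\lim_{n\to\infty} \frac{\sup_{\theta \in \Gamma} \prod_{i=1}^{n} f(x_i;\theta)}{\prod_{i=1}^{n} f(x_i;\theta_0)} = 0, \quad \text{a.e.}$$

holds by theorems in Wald (1949), Redner (1981). Therefore it suffices to prove

$$\lim_{n\to\infty} \frac{\sup_{\theta \in \Theta'_n} \prod_{i=1}^{n} f(x_i;\theta)}{\prod_{i=1}^{n} f(x_i;\theta_0)} = 0, \quad \text{a.e.}$$

Note that in the argument above the supremum of the likelihood function over $\Gamma \cup \Theta'_n$ is considered separately for $\Gamma$ and $\Theta'_n$. $\Gamma$ and $\Theta'_n$ form a covering of $\Gamma \cup \Theta'_n$. In our proof, we consider finer and finer finite coverings of $\Theta'_n$. As above, it suffices to prove that the ratio of the supremum of the likelihood over each member of the covering to the likelihood at $\theta_0$ converges to zero almost everywhere.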

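Let $\theta \in \Theta'_n$. Let $\mathcal{K}_{\sigma \le c_0}$, $\mathcal{K}_{\sigma \ge B}$, $\mathcal{K}_{|\mu| \ge A_0}$ represent disjoint subsets of $\{1, \ldots, M\}$ and define

$$\mathcal{K}_R = \{1, \ldots, M\} \setminus \{\mathcal{K}_{\sigma \le c_0} \cup \mathcal{K}_{\sigma \ge B} \cup \mathcal{K}_{|\mu| \ge A_0}\}.$$

For any given $\mathcal{K}_{\sigma \le c_0}$, $\mathcal{K}_{\sigma \ge B}$, $\mathcal{K}_{|\mu| \ge A_0}$, we define a subset of $\Theta'_n$ by

$$\Theta'_{n,\mathcal{K}} = \{\theta \in \Theta'_n \mid \sigma_m \le c_0\ (m \in \mathcal{K}_{\sigma \le c_0});\ \sigma_m \ge B\ (m \in \mathcal{K}_{\sigma \ge B});\ c_0 < \sigma_m < B,\ |\mu_m| \ge A_0\ (m \in \mathcal{K}_{|\mu| \ge A_0});\ c_0 < \sigma_m < B,\ |\mu_m| < A_0\ (m \in \mathcal{K}_R)\}. \tag{28}$$

As above, it suffices to prove that for each choice of disjoint subsets $\mathcal{K}_{\sigma \le c_0}$, $\mathcal{K}_{\sigma \ge B}$, $\mathcal{K}_{|\mu| \ge A_0}$,

$$\lim_{n\to\infty} \frac{\sup_{\theta \in \Theta'_{n,\mathcal{K}}} \prod_{i=1}^{n} f(x_i;\theta)}{\prod_{i=1}^{n} f(x_i;\theta_0)} = 0, \quad \text{a.e.}$$

We fix $\mathcal{K}_{\sigma \le c_0}$, $\mathcal{K}_{\sigma \ge B}$, $\mathcal{K}_{|\mu| \ge A_0}$ from now on.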

Next we consider coverings of Qxn- Recall that Q.x, fjf{x; 9jf) and f._r{x; 9x',p) are 
defined in (j2I), © and (fT^ . The following lemma follows from lemma Inland compactness 

of Q^Tr- 

Lemma 8. Let ^{6,p{6)) denote the open ball with center 9 and radius p{0). Then 9.Xr 
can he covered by a finite number of balls Jl^{6^^^, p{6^^^)) , . . . , ^{O^^j^, p{6^^jj) such that 

Eo[log{f,^,{x; e%,p{e%)) + Ko}] + Ao < Eo[\ogf{x- O^)] , (s = 1, . . . , 5) . 
Proof: From lemma El we have 

lini Eo [log {/^^(x; Oxr, p) + ^o}] = Eq [log {f.^^{x] Oxa) + /to}] • 

For each O^r G 9.^^ 

Eq [log {fxR{x; O.Xr) + /to}] + Ao < ^o[log /(x; ^o)] 
holds. Therefore for each $\theta_{\mathcal{K}_R} \in \Theta_{\mathcal{K}_R}$ there exists a radius $\rho(\theta_{\mathcal{K}_R}) > 0$ such that
^o[log{/jr«(x; e,^^,p{d,^^)) + Ko}] + Aq < Eo[log/(x; ^o)]- 

Since 

and the compactness of ©,Xr, there exists a finite number of balls e^(^^|^^, p(6'^|^)), . . . , 
^(^2>P(^S)) which cover e^,. ' ' □ 

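Based on lemma 8 we partition $\Theta'_{n,\mathcal{K}}$. Recall that we denote by $\theta_{\mathcal{K}}$ the subvector of $\theta \in \Theta$ consisting of the components in $\mathcal{K}$. Define a subset of $\Theta'_{n,\mathcal{K}}$ by

$$\Theta'_{n,\mathcal{K},s} = \{\theta \in \Theta'_{n,\mathcal{K}} \mid \theta_{\mathcal{K}_R} \in \mathcal{B}(\theta^{(s)}_{\mathcal{K}_R}, \rho(\theta^{(s)}_{\mathcal{K}_R}))\}. \tag{29}$$

Then $\Theta'_{n,\mathcal{K}}$ is covered by $\Theta'_{n,\mathcal{K},1}, \ldots, \Theta'_{n,\mathcal{K},S}$.

Again it suffices to prove that for each choice of $\mathcal{K}_{\sigma \le c_0}$, $\mathcal{K}_{\sigma \ge B}$, $\mathcal{K}_{|\mu| \ge A_0}$ and $s$,

$$\lim_{n\to\infty} \frac{\sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \prod_{i=1}^{n} f(x_i;\theta)}{\prod_{i=1}^{n} f(x_i;\theta_0)} = 0, \quad \text{a.e.} \tag{30}$$

We fix $\mathcal{K}_{\sigma \le c_0}$, $\mathcal{K}_{\sigma \ge B}$, $\mathcal{K}_{|\mu| \ge A_0}$ and $s$ from now on. Because

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \log f(x_i;\theta_0) = E_0[\log f(x;\theta_0)], \quad \text{a.e.}$$

(30) is implied by

$$\limsup_{n\to\infty} \frac{1}{n} \sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \sum_{i=1}^{n} \log f(x_i;\theta) < E_0[\log f(x;\theta_0)], \quad \text{a.e.} \tag{31}$$

Therefore it suffices to prove (31), which is a new intermediate goal of our proof hereafter.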

4.3.3 Bounding the likelihood by four terms 

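In this section we bound the likelihood function by four terms depending on the positions of the observations $x_1, \ldots, x_n$. Let $R_n(V)$ denote the number of observations which belong to a set $V \subset \mathbb{R}$.

Lemma 9. For $\theta \in \Theta'_{n,\mathcal{K},s}$,

$$\frac{1}{n}\sum_{i=1}^{n} \log f(x_i;\theta) \le \frac{1}{n}\sum_{i=1}^{n} \log\{f_{\mathcal{K}_R}(x_i;\theta^{(s)}_{\mathcal{K}_R},\rho(\theta^{(s)}_{\mathcal{K}_R})) + 3\kappa_0\} + \frac{1}{n}R_n(\mathcal{A}_0) \cdot \log\Big(\frac{v_0/c_0 + 2\kappa_0}{3\kappa_0}\Big) + \frac{1}{n}R_n(J(\theta)) \cdot (-\log\kappa_0) + \frac{1}{n}\sum_{x_i \in J(\theta)} \log f(x_i;\theta). \tag{32}$$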

n n 



Proof: Let =X>co = {1, • • • , M}\^<co and J(fco<a<B = {1, • • • , M}\{^<co U JC>b}- 
For X ^ 7(6*), /(x; 6^) < 6',jr„>,J + holds. Therefore 



1 " 1 1 

-5^1og/(a;,;^^) < - Yl ^ogf{x,;9) + - ^ hg{U,Jx;9,j,^^J + Ko} 

1 " 

- Y log {/^.>co(^; ^=^.>co) + «^o} 

1 

+ - 5^ [log/(x,;^)-log{/,^,,,^(x;^..;r.>.J + /^o}](33) 



n 

i=l 



n 

xi&J{e) 



Consider the second term on the right-hand side. We have 

i Y [iog/(^i;^)-iog{/.^.>co(^;^-^.>.o) + ^o}] 

< - Y iogf{x,;d)--R4J{e))-log^o 



n 

x^eJie) 



n — ' n 

xi£j{e) 



This takes care of the third and the fourth term of ()32j) . 

Now consider the first term on the right-hand side of ()33p. Note that 

^ n 1 " 

- ^ log + Ko} < - XI {/-^co<.<B(a;i; ^.x,o<.<b) + 2fi;o} 

For X ^ 
Therefore we obtain



1 

- ^log + 

" i=i 

1 " 

= - X] log { (xi ; e,^^ ) + 3/to } 

i=l 

+ ~ 5Z [log {/•^co<.<s(3^i^ ^.^co<.<s) + 2/«o} - log {fxniXi] 9,xJ + Skq}] 

Xi&£/o 

(34) 

Note that /^,„«,<B(a;; 6'^,„«,<b) < vq/cq from lemma [B Therefore 

The r.h.s of dSl 

1 " 1 
- n ^ {fjf^Ri^i' ^-^r) + 3«;o} + ~ 5Z [1°S {^^o/co + 2ko} - logS/to] 

i=l XiSM) 

< ^ E log p(e.^J) + + ^i?n(^o) ■ log (-'^''' + ^""^ 

i=l ^ 



3^0 

This takes care of the first and the second term of (13211 . □ 



From lemma 8 and the strong law of large numbers, the first term on the right-hand side of (32) converges to the expectation of a density which has fewer than $M$ components, and this expectation is less than that of the true density by theorem 1. The second term converges to a small value because the relative frequency on $\mathcal{A}_0$ is very small. The third term also converges to a small value because the relative frequency on $J(\theta)$ is very small. The fourth term is somewhat complicated. The components in $\mathcal{K}_{\sigma \le c_0}$ have high peaks. However, the widths of the peaks are very narrow and the relative frequency on these intervals is very small. Hence the fourth term makes little contribution to the likelihood. Therefore the mean log likelihood (the left-hand side of (32)) converges to a value which is less than that of the true density. In the following we consider the details.

The first term and the second term are easy.

The first term: By lemma 8 and the strong law of large numbers we have

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \log\{f_{\mathcal{K}_R}(x_i;\theta^{(s)}_{\mathcal{K}_R},\rho(\theta^{(s)}_{\mathcal{K}_R})) + 4\kappa_0\} \le E_0[\log f(x;\theta_0)] - 4\lambda_0, \quad \text{a.e.} \tag{35}$$

The second term: By (26) and the strong law of large numbers we have

$$\lim_{n\to\infty} \frac{1}{n}R_n(\mathcal{A}_0) \cdot \log\Big(\frac{v_0/c_0 + 2\kappa_0}{3\kappa_0}\Big) \le \lambda_0, \quad \text{a.e.} \tag{36}$$

Note that we have $-4\lambda_0$ from the first term and $\lambda_0$ from the second term. In the rest of our proof we show that both the third term and the fourth term can be bounded by $\lambda_0$.

4.3.4 Bounding the third term 

The third term can be bounded by dividing the interval $[-A_n, A_n]$ into short intervals of length $2\nu(c_0)$.
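The covering used for the third term can be sketched as follows. This is an illustration with hypothetical helper names and toy numbers: $[-A_n, A_n]$ is cut into intervals of a fixed width (playing the role of $2\nu(c_0)$), the last interval is shifted left so that it ends exactly at $A_n$, and we count how many short intervals a window of the same width can meet.

```python
import math

def short_intervals(a_n, width):
    """Cover [-a_n, a_n] by intervals of length `width`.

    The last interval is shifted left so its right end equals a_n,
    which may make it overlap its left neighbour.
    """
    if width <= 0.0 or 2.0 * a_n < width:
        raise ValueError("need 0 < width <= 2 * a_n")
    count = math.ceil(2.0 * a_n / width)
    starts = [-a_n + i * width for i in range(count - 1)] + [a_n - width]
    return [(s, s + width) for s in starts]

def hits(intervals, lo, hi):
    """Number of short intervals whose interior meets the window (lo, hi)."""
    return sum(1 for a, b in intervals if a < hi and lo < b)

# Toy values, exactly representable in binary to keep comparisons clean.
a_n, width = 5.0, 1.5
cover = short_intervals(a_n, width)

# Slide a window of length `width` across [-a_n, a_n] in steps of 0.125.
steps = int((2.0 * a_n - width) / 0.125)
worst = max(
    hits(cover, -a_n + 0.125 * j, -a_n + 0.125 * j + width)
    for j in range(steps + 1)
)
print(len(cover), worst)
```

The number of short intervals stays below $A_n/\nu(c_0) + 1$ and no window of the same width meets more than three of them, which is where the factor $3M$ in the bound for $J(\theta)$ comes from.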

Lemma 10.

$$\limsup_{n\to\infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \frac{1}{n}R_n(J(\theta)) \le 3M \cdot u_0 \cdot 2\nu(c_0), \quad \text{a.e.}$$

Proof: Let $\epsilon > 0$ be arbitrarily fixed and let $J_0^{(n)} = [-A_n, A_n]$. We divide $J_0^{(n)}$ from $-A_n$ to $A_n$ by short intervals of length $2\nu(c_0)$. At the right end of $J_0^{(n)}$, overlap of two short intervals of length $2\nu(c_0)$ is allowed and the right end of a short interval coincides with the right end of $J_0^{(n)}$. See Figure 3.

Figure 3: Division of $J_0^{(n)}$ by short intervals of length $2\nu(c_0)$.

Let $k_n(c_0)$ be the number of short intervals and let $I_1^{(n)}(c_0), \ldots, I_{k_n(c_0)}^{(n)}(c_0)$ be the divided short intervals. Then we have

$$k_n(c_0) \le \frac{2A_n}{2\nu(c_0)} + 1 = \frac{A_n}{\nu(c_0)} + 1 = \frac{A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}}{\nu(c_0)} + 1. \tag{37}$$

Note that any interval in $J_0^{(n)}$ of length $2\nu(c_0)$ is covered by at most 3 small intervals from $\{I_1^{(n)}(c_0), \ldots, I_{k_n(c_0)}^{(n)}(c_0)\}$. Now consider $J(\theta) = \bigcup_{m=1}^{K} [\mu_m - \nu(\sigma_m), \mu_m + \nu(\sigma_m))$. Since $[\mu_m - \nu(\sigma_m), \mu_m + \nu(\sigma_m))$, $(m = 1, \ldots, K)$, are intervals of length less than or equal to $2\nu(c_0)$, $J(\theta)$ is covered by at most $3M$ short intervals. Then
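$$\sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \frac{1}{n}R_n(J(\theta)) - 3M \cdot u_0 \cdot 2\nu(c_0) > \epsilon$$

implies that $x_{n,1} < -A_n$ or $x_{n,n} > A_n$, or

$$1 \le \exists k \le k_n(c_0), \quad \frac{1}{n}R_n(I_k^{(n)}(c_0)) - u_0 \cdot 2\nu(c_0) > \frac{\epsilon}{3M}. \tag{38}$$

By lemma 5, $\sum_n \mathrm{Prob}(x_{n,1} < -A_n \text{ or } x_{n,n} > A_n) < \infty$ and the first event can be ignored. We only need to consider the second event (38). We will use the same logic in the proofs of lemmas 11 and 12 below. Then

$$\mathrm{Prob}\Big(\sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \frac{1}{n}R_n(J(\theta)) - 3M \cdot u_0 \cdot 2\nu(c_0) > \epsilon\Big) \le \sum_{k=1}^{k_n(c_0)} \mathrm{Prob}\Big(\frac{1}{n}R_n(I_k^{(n)}(c_0)) - u_0 \cdot 2\nu(c_0) > \frac{\epsilon}{3M}\Big).$$

Recall that, for any set $V \subset \mathbb{R}$, we denote by $P_0(V)$ the probability of $V$ under the true density in (25) and denote by $R_n(V)$ the number of observations which belong to $V$ as in lemma 9. Since

$$P_0(I_k^{(n)}(c_0)) \le u_0 \cdot 2\nu(c_0), \quad (k = 1, \ldots, k_n(c_0)),$$

$R_n(V) \sim \mathrm{Bin}(n, P_0(V))$ and from Okamoto's inequality (Okamoto (1958)), we obtain

$$\mathrm{Prob}\Big(\frac{1}{n}R_n(I_k^{(n)}(c_0)) - u_0 \cdot 2\nu(c_0) > \frac{\epsilon}{3M}\Big) \le \mathrm{Prob}\Big(\frac{1}{n}R_n(I_k^{(n)}(c_0)) - P_0(I_k^{(n)}(c_0)) > \frac{\epsilon}{3M}\Big) \le \exp\Big(-\frac{2n\epsilon^2}{9M^2}\Big).$$

Therefore from (37)

$$\mathrm{Prob}\Big(\sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \frac{1}{n}R_n(J(\theta)) - 3M \cdot u_0 \cdot 2\nu(c_0) > \epsilon\Big) \le \Big(\frac{A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}}{\nu(c_0)} + 1\Big) \cdot \exp\Big(-\frac{2n\epsilon^2}{9M^2}\Big).$$

When we sum this over $n$, the resulting series on the right converges. Hence by the Borel-Cantelli lemma, we have

$$\mathrm{Prob}\Big(\sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \frac{1}{n}R_n(J(\theta)) - 3M \cdot u_0 \cdot 2\nu(c_0) > \epsilon \ \text{i.o.}\Big) = 0.$$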
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Because e > was arbitrary, we obtain 



limsup sup —Rn{J{0))<3M-uo-2i'{co), a.e. 

' n.J^, s 



By this lemma and (j211) we have 



hmsup sup —Rn{J{6)) ■ (— log^o) < 3M ■ uq ■ 2z/(co) ■ |log/€o| < Aq a.e. 

n^oo 6»e0' ^ ^ 

This bounds the third term on the right-hand side of ()32p from above. 



□ 



(39) 
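The two ingredients of the argument above, Okamoto's exponential bound for a binomial tail and the summability of a polynomial factor times that bound, can be checked numerically. The sketch below is only an illustration: the function names and all constants (`eps`, `M`, `a0`, `zeta`, `beta`, `nu`) are hypothetical choices, not values from the paper.

```python
import math

def binomial_upper_tail(n, p, c):
    """Exact Prob(X / n - p >= c) for X ~ Bin(n, p)."""
    k_min = max(0, math.ceil(n * (p + c) - 1e-12))
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

def okamoto_bound(n, c):
    """Okamoto's upper bound exp(-2 n c^2) for the same tail probability."""
    return math.exp(-2.0 * n * c * c)

def series_term(n, eps, M, a0, zeta, beta, nu):
    """(a0 * n^((2 + zeta) / (beta - 1)) / nu + 1) * exp(-2 n eps^2 / (9 M^2))."""
    count = a0 * n ** ((2.0 + zeta) / (beta - 1.0)) / nu + 1.0
    return count * okamoto_bound(n, eps / (3.0 * M))

# Hypothetical constants, chosen only to make the numbers easy to read.
params = dict(eps=0.5, M=2, a0=1.0, zeta=1.0, beta=2.0, nu=0.1)
print(binomial_upper_tail(200, 0.1, 0.05), "<=", okamoto_bound(200, 0.05))
for upper in (1000, 5000, 10000):
    partial = math.fsum(series_term(n, **params) for n in range(1, upper + 1))
    print(upper, partial)
```

With these values the partial sums stop changing well before the largest cutoff, which is the behaviour the Borel-Cantelli step relies on: the exponential factor eventually dominates any polynomial growth in the number of short intervals.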



4.3.5 Bounding the fourth term

Finally we bound the fourth term on the right-hand side of (32) from above. From lemma 2 we have

$$1_{J(\theta)}(x) \cdot \sum_{m=1}^{M} \alpha_m f_m(x; \mu_m, \sigma_m) \le \sum_{t=1}^{T(\theta)} H(J_t(\theta)) \cdot 1_{J_t(\theta)}(x), \quad (x \in J(\theta)) . \tag{40}$$

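We now classify the intervals $J_t(\theta)$, $t = 1, \ldots, T(\theta)$, by the height $H(J_t(\theta))$. Let

$$c'_n = c_0 \cdot \exp(-n^{1/4}) \tag{41}$$

and define $\tau_n(\theta)$ and $\tau'_n(\theta)$ by

$$\tau_n(\theta) = \{ t \in \{1, \ldots, T(\theta)\} \mid H(J_t(\theta)) \le M v_0 / c'_n \}, \quad \tau'_n(\theta) = \{1, \ldots, T(\theta)\} \setminus \tau_n(\theta) . \tag{42}$$

See Figure 4.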
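The classification by height can be sketched in a few lines of Python. The heights and the constants `M`, `v0`, `c0` below are hypothetical, and the non-strict comparison with the threshold $M v_0 / c'_n$, where $c'_n = c_0 \exp(-n^{1/4})$, is an assumption about how ties are assigned.

```python
import math

def classify_by_height(heights, n, M, v0, c0):
    """Split the indices t = 1, ..., T into (tau_n, tau_n') by the threshold M * v0 / c_n'."""
    c_prime = c0 * math.exp(-n ** 0.25)      # c_n' = c0 * exp(-n^(1/4))
    threshold = M * v0 / c_prime
    # Assumption: an interval whose height equals the threshold is counted as "low".
    tau = [t for t, h in enumerate(heights, start=1) if h <= threshold]
    tau_prime = [t for t, h in enumerate(heights, start=1) if h > threshold]
    return tau, tau_prime

# Hypothetical heights H(J_1), ..., H(J_4) and constants.
tau, tau_prime = classify_by_height([5.0, 80.0, 120.0, 20.0], n=16, M=2, v0=0.4, c0=0.1)
print(tau, tau_prime)   # [1, 4] [2, 3]
```

The example values were picked so that the split reproduces the pattern of Figure 4, with the first and fourth intervals low and the second and third high.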

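Now suppose that the following inequality holds.

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \left[ \sum_{t=1}^{T(\theta)} \frac{1}{n} R_n(J_t(\theta)) \log H(J_t(\theta)) - 3 \left\{ \sum_{t \in \tau_n(\theta)} u_0 \cdot \xi(H(J_t(\theta))) \cdot \log H(J_t(\theta)) + \sum_{t \in \tau'_n(\theta)} \frac{1}{n} \log H(J_t(\theta)) \right\} \right] \le 0, \quad \text{a.e.} \tag{43}$$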



From (I2H), and noting that logy/y^ is decreasing in > e, we have 

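$$3 \cdot \sum_{t \in \tau_n(\theta)} u_0 \cdot \xi(H(J_t(\theta))) \cdot \log H(J_t(\theta)) \le 3 \cdot 2M \cdot u_0 \cdot \xi(v_0/c_0) \cdot \log(v_0/c_0) < \lambda_0 ,$$

$$3 \cdot \sum_{t \in \tau'_n(\theta)} \frac{1}{n} \log H(J_t(\theta)) \le 3 \cdot 2M \cdot \frac{1}{n} \cdot \log \frac{M v_0}{c_n} \to 0 . \tag{44}$$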
Figure 4: Example of classification of the intervals by the height ($\tau_n(\theta) = \{1, 4\}$, $\tau'_n(\theta) = \{2, 3\}$).



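Then from (40), (43) and (44), the limit superior as $n \to \infty$ of the supremum over $\theta \in \Theta'_{n,\mathcal{K},s}$ of the fourth term on the right-hand side of (32) is bounded from above by $\lambda_0$ almost everywhere. (45)

Combining (32) with the bounds obtained for its four terms, we obtain

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s}} \frac{1}{n} \sum_{i=1}^{n} \log f(x_i; \theta) < E_0[\log f(x; \theta_0)] - \lambda_0, \quad \text{a.e.}$$

and the required inequality is satisfied. Therefore it suffices to prove (43), which is a new goal of our proof.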

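We now consider a further finite covering of $\Theta'_{n,\mathcal{K},s}$. For any $T$ $(1 \le T \le 2M)$ and $\tau \subset \{1, \ldots, T\}$, define a subset of $\Theta'_{n,\mathcal{K},s}$ by

$$\Theta'_{n,\mathcal{K},s,T,\tau} = \{ \theta \in \Theta'_{n,\mathcal{K},s} \mid T(\theta) = T, \ \tau_n(\theta) = \tau \} . \tag{46}$$

Then (43) is derived from the following two lemmas.

Lemma 11.

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \left[ \sum_{t \in \tau'} \frac{1}{n} R_n(J_t(\theta)) \cdot \log H(J_t(\theta)) - 3 \sum_{t \in \tau'} \frac{1}{n} \log H(J_t(\theta)) \right] \le 0 \quad \text{a.e.}$$

where $\tau' = \{1, \ldots, T\} \setminus \tau$.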
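Proof: Let $\delta > 0$ be any fixed positive real constant and let $\mu'_t(\theta)$ denote the middle point of $J_t(\theta)$. Here, we consider the probability of the event that

$$\sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \left[ \sum_{t \in \tau'} \frac{1}{n} R_n(J_t(\theta)) \cdot \log H(J_t(\theta)) - 3 \sum_{t \in \tau'} \frac{1}{n} \log H(J_t(\theta)) \right] > 2M\delta . \tag{47}$$

Since $H(J_t(\theta)) > M v_0 / c'_n$ holds for $t \in \tau'$, we obtain by the lemma on the widths $W(J_t(\theta))$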



W{Jt{d)) < V2 



Mvq 



/3 



f 2 ■ 



Co 

MVn 



exp (— /3 ■ n 



l/4> 



Let 



f3 = 



Co 



Wr. 



E l-expM-n^/^), 



(48) 
(49) 



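Noting that for $t \in \tau'$, the length of $J_t(\theta)$ is less than or equal to $2w_n$, the following relation holds.

The event (47) occurs.

$$\Rightarrow \ \sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \sum_{t \in \tau'} \left( \frac{1}{n} R_n[\mu'_t(\theta), w_n] - 3 \cdot \frac{1}{n} \right) \cdot \log \frac{M v_0}{c_n} > 2M\delta$$

$$\Rightarrow \ \exists \theta \in \Theta'_{n,\mathcal{K},s,T,\tau}, \ \exists t \in \tau' \ \text{s.t.} \ \left( \frac{1}{n} R_n[\mu'_t(\theta), w_n] - 3 \cdot \frac{1}{n} \right) \cdot \log \frac{M v_0}{c_n} > \delta$$

$$\Rightarrow \ \exists \theta \in \Theta'_{n,\mathcal{K},s,T,\tau}, \ \exists t \in \tau' \ \text{s.t.} \ R_n[\mu'_t(\theta), w_n] > 6$$

$$\Rightarrow \ \sup_{-\infty < \mu' < \infty} R_n[\mu', w_n] > 6 . \tag{50}$$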



Below, we consider the probability of the event that (50) occurs. We divide $J_0^{(n)} = [-A_n, A_n]$ from $-A_n$ to $A_n$ by short intervals of length $2w_n$ as in the proof of lemma 10. Let $k(w_n)$ be the number of short intervals and let $I_1(w_n), \ldots, I_{k(w_n)}(w_n)$ be the divided short intervals. Then we have

$$k(w_n) \le \frac{2A_n}{2w_n} + 1 = \frac{A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}}{w_n} + 1 . \tag{51}$$



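Since any interval in $J_0^{(n)}$ of length $2w_n$ is covered by at most 3 small intervals from $\{I_1(w_n), \ldots, I_{k(w_n)}(w_n)\}$, and from the lemma bounding the extreme observations,

$$\sup_{-\infty < \mu' < \infty} R_n[\mu', w_n] > 6 \ \Rightarrow \ 1 \le \exists k \le k(w_n), \ R_n(I_k(w_n)) > 2 .$$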
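The implication above is a pigeonhole step: more than 6 observations spread over at most 3 covering intervals force one of those intervals to hold more than 2. A small sketch, assuming for simplicity half-open windows and a regular grid of intervals of length $2w$ (the helper names and the simulated sample are hypothetical):

```python
import math
import random

def count_in(points, left, right):
    """Number of points in the half-open interval [left, right)."""
    return sum(1 for x in points if left <= x < right)

def grid_cover(mu, w):
    """Grid intervals [2w*j, 2w*(j+1)) that intersect the window [mu - w, mu + w)."""
    width = 2.0 * w
    j = math.floor((mu - w) / width)
    cover = []
    while j * width < mu + w:
        cover.append((j * width, (j + 1) * width))
        j += 1
    return cover

random.seed(0)
points = [random.uniform(0.0, 10.0) for _ in range(60)]   # hypothetical sample
w = 0.5
for i in range(100):
    mu = 0.5 + 0.09 * i
    cover = grid_cover(mu, w)
    in_window = count_in(points, mu - w, mu + w)
    most = max(count_in(points, a, b) for a, b in cover)
    if in_window > 6:
        # more than 6 points over at most 3 intervals: one holds more than 2
        assert len(cover) <= 3 and most > 2
```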
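Note that $R_n(I_k(w_n)) \sim \mathrm{Bin}(n, P_0(I_k(w_n)))$ and $P_0(I_k(w_n)) \le 2w_n u_0$. Therefore from (51) we have

$$\sum_{k=1}^{k(w_n)} \mathrm{Prob}\left( R_n(I_k(w_n)) > 2 \right) \le \left( \frac{A_n}{w_n} + 1 \right) \cdot \max_{1 \le k \le k(w_n)} \mathrm{Prob}\left( R_n(I_k(w_n)) > 2 \right)$$

$$\le \left( \frac{A_n}{w_n} + 1 \right) \sum_{k=3}^{n} \binom{n}{k} (2w_n u_0)^k \le \left( \frac{A_n}{w_n} + 1 \right) (2nw_n u_0)^3 \sum_{k=0}^{n-3} \frac{1}{k!} (2nw_n u_0)^k$$

$$\le \left( \frac{A_n}{w_n} + 1 \right) (2nw_n u_0)^3 \exp(2nw_n u_0) .$$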
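The last chain uses the elementary bound $\mathrm{Prob}(X > 2) \le (nq)^3 \exp(nq)$ for $X \sim \mathrm{Bin}(n, p)$ with $p \le q$. A quick numerical check, with hypothetical values of `n` and `q` (here `q` plays the role of $2w_n u_0$ and the success probability is taken equal to `q`):

```python
import math

def prob_more_than_two(n, p):
    """Prob(X > 2) for X ~ Bin(n, p), 0 < p < 1, summing the pmf via its recurrence."""
    pmf = (1.0 - p) ** n                              # Prob(X = 0)
    tail = 0.0
    for k in range(n):
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)      # now Prob(X = k + 1)
        if k + 1 >= 3:
            tail += pmf
    return tail

def cubic_exponential_bound(n, q):
    """The bound (n q)^3 * exp(n q)."""
    return (n * q) ** 3 * math.exp(n * q)

# Hypothetical values; q plays the role of 2 * w_n * u_0.
for n, q in [(100, 1e-3), (1000, 1e-4), (10000, 1e-6)]:
    print(n, q, prob_more_than_two(n, q), cubic_exponential_bound(n, q))
```

The bound is crude but decays like the cube of $n w_n$, which is what makes the series summable once $w_n$ shrinks exponentially in a power of $n$.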



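When we sum this over $n$, the resulting series on the right converges. Hence by the Borel-Cantelli lemma and the fact that $\delta > 0$ was arbitrary, we obtain

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \left[ \sum_{t \in \tau'} \frac{1}{n} R_n(J_t(\theta)) \cdot \log H(J_t(\theta)) - 3 \sum_{t \in \tau'} \frac{1}{n} \log H(J_t(\theta)) \right] \le 0 \quad \text{a.e.}$$

□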



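Lemma 12.

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \left[ \sum_{t \in \tau} \frac{1}{n} R_n(J_t(\theta)) \cdot \log H(J_t(\theta)) - 3 \sum_{t \in \tau} u_0 \cdot \xi(H(J_t(\theta))) \cdot \log H(J_t(\theta)) \right] \le 0 \quad \text{a.e.}$$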



Proof: Let $\delta > 0$ be any fixed positive real constant and let

5 r , Mvo 

Since $v_0/c_0 \le H(J_t(\theta)) \le M v_0/c'_n$, we have $\xi(M v_0/c'_n) \le \xi(H(J_t(\theta))) \le \xi(v_0/c_0)$. We divide the interval $[\xi(M v_0/c'_n), \xi(v_0/c_0)]$ from $\xi(v_0/c_0)$ to $\xi(M v_0/c'_n)$ by short intervals of length $h_n$. At the left end $\xi(M v_0/c'_n)$ of the interval $[\xi(M v_0/c'_n), \xi(v_0/c_0)]$, overlap of two short intervals of length $h_n$ is allowed and the left end of a short interval is equal to $\xi(M v_0/c'_n)$. Let $l_n$ be the number of short intervals of length $h_n$ and define $w_l^{(n)}$ by

$$2w_l^{(n)} = \begin{cases} \xi(v_0/c_0) - (l-1) h_n, & 1 \le l \le l_n, \\ \xi(M v_0/c'_n), & l = l_n + 1 . \end{cases} \tag{53}$$
Then we have



(54) 
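The grid of values $2w_l^{(n)}$ described in this proof can be sketched as follows: start at the right end, step down by $h_n$ while staying above the left end, and finish with the left end itself. The function name and the numbers are hypothetical; `xi_hi`, `xi_lo` and `h` stand for $\xi(v_0/c_0)$, $\xi(M v_0/c'_n)$ and $h_n$.

```python
def width_grid(xi_hi, xi_lo, h):
    """Return [2w_1, ..., 2w_{l_n + 1}] for the interval [xi_lo, xi_hi] and step h."""
    values = []
    l = 1
    while xi_hi - (l - 1) * h > xi_lo:
        values.append(xi_hi - (l - 1) * h)   # 2 w_l for 1 <= l <= l_n
        l += 1
    values.append(xi_lo)                     # 2 w_{l_n + 1}
    return values

# Hypothetical end points and step length.
grid = width_grid(xi_hi=1.0, xi_lo=0.25, h=0.2)
print(len(grid) - 1, "short intervals:", grid)
```

Consecutive grid values differ by at most `h`, so every value in the interval lies between two neighbouring grid points that are at most `h` apart.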



Let 



1//3 



{y>o) 



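where $\xi^{-1}(\cdot)$ is the inverse function of $\xi(\cdot)$. Next we consider the probability of the event that

$$\sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \left[ \sum_{t \in \tau} \frac{1}{n} R_n(J_t(\theta)) \cdot \log H(J_t(\theta)) - 3 \sum_{t \in \tau} u_0 \cdot \xi(H(J_t(\theta))) \cdot \log H(J_t(\theta)) \right] > 2M\delta . \tag{55}$$

For this event the following relation holds.

The event (55) occurs.

$$\Rightarrow \ \exists \theta \in \Theta'_{n,\mathcal{K},s,T,\tau}, \ \forall t \in \tau, \ 1 \le \exists l(t) \le l_n \ \text{s.t.} \ 2w_{l(t)+1}^{(n)} \le \xi(H(J_t(\theta))) \le 2w_{l(t)}^{(n)}$$

and

$$\sum_{t \in \tau} \left( \frac{1}{n} R_n(J_t(\theta)) - 3u_0 \cdot \xi(H(J_t(\theta))) \right) \cdot \log H(J_t(\theta)) > 2M\delta$$

$$\Rightarrow \ \exists \theta \in \Theta'_{n,\mathcal{K},s,T,\tau}, \ \exists t \in \tau, \ 1 \le \exists l(t) \le l_n \ \text{s.t.} \ \left( \frac{1}{n} R_n[\mu'_t(\theta), w_{l(t)}^{(n)}] - 3u_0 \cdot 2w_{l(t)+1}^{(n)} \right) \cdot \log \xi^{-1}(2w_{l(t)+1}^{(n)}) > \delta$$

$$\Rightarrow \ 1 \le \exists l \le l_n \ \text{s.t.} \ \sup_{-\infty < \mu' < \infty} \left( \frac{1}{n} R_n[\mu', w_l^{(n)}] - 3u_0 \cdot 2w_{l+1}^{(n)} \right) \cdot \log \xi^{-1}(2w_{l+1}^{(n)}) > \delta$$

$$\Rightarrow \ 1 \le \exists l \le l_n \ \text{s.t.} \ \sup_{-\infty < \mu' < \infty} \left\{ \left( \frac{1}{n} R_n[\mu', w_l^{(n)}] - 3u_0 \cdot 2w_l^{(n)} \right) \cdot \log \xi^{-1}(2w_{l+1}^{(n)}) + 3u_0 \cdot (2w_l^{(n)} - 2w_{l+1}^{(n)}) \cdot \log \xi^{-1}(2w_{l+1}^{(n)}) \right\} > \delta . \tag{56}$$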
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Then from ()52|) and lemma El the following relation holds. 

The event fjKH|l occurs. 
1 < 3/ < L s.t. 



sup - (Rn[fi', «;;"^] - 3^0 ■ 2w;;")) ■ \og^{2wll\) > 



— oo</^'<oo 

1 < 3/ < L s.t. 



sup - wl""^] - 3uo ■ 2^}")) ■ \og^P{2wil\) > 



-A„<n'<A^ 



(57) 



Below, we consider the probability of the event that (57) occurs. We divide $J_0^{(n)}$ from $-A_n$ to $A_n$ by short intervals of length $2w_l^{(n)}$ as in the proof of lemma 10. Let $k(w_l^{(n)})$ be the number of short intervals and let $I_1(w_l^{(n)}), \ldots, I_{k(w_l^{(n)})}(w_l^{(n)})$ be the divided short intervals. Then we have

$$k(w_l^{(n)}) \le \frac{2A_n}{2w_l^{(n)}} + 1 . \tag{58}$$



Since any interval in $J_0^{(n)}$ of length $2w_l^{(n)}$ is covered by at most 3 small intervals from $\{I_1(w_l^{(n)}), \ldots, I_{k(w_l^{(n)})}(w_l^{(n)})\}$, we have



1 



-An</i'<An \n 



sup ( -Rn Ifi', wl""^] - 3uo ■ 2wr> \ogij{2wi;> 



(n) 



max (^Rnihiwl-^)) - u, ■ 2wt^) >\.^- (log^(2«;(;j^,; 



-1 



■ (59) 



Note that $R_n(I_k(w_l^{(n)})) \sim \mathrm{Bin}(n, P_0(I_k(w_l^{(n)})))$ and $P_0(I_k(w_l^{(n)})) \le u_0 \cdot 2w_l^{(n)}$. Therefore from (58) and Okamoto's inequality (Okamoto (1958)) we have



Prob max 



k=l,...,k{w\ ') 



\ {Rn{h{^t^)) - % ■ 2wt^) >\\ {log^(24';;^,) 



< 



2A 



2w 



< 



(n) 



+ M ■ exp 



-2r2 • 
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1 + M ■ exp 

From (jnH), dini), (EH), dSni), and ((201) we obtain 



-2n-^{log(MV4)}'' 



(60) 



V Prob { sup - w;?"^] - 3mo ■ 2«;f")) ■ log V^(2«;fjj > ^) 



Ar, 



— + M -exp 



-2n-^{log(MVO}-' 
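When we sum this over $n$, the resulting series on the right converges. Hence by the Borel-Cantelli lemma and the fact that $\delta > 0$ is arbitrary, we have

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta'_{n,\mathcal{K},s,T,\tau}} \left[ \sum_{t \in \tau} \frac{1}{n} R_n(J_t(\theta)) \cdot \log H(J_t(\theta)) - 3 \sum_{t \in \tau} u_0 \cdot \xi(H(J_t(\theta))) \cdot \log H(J_t(\theta)) \right] \le 0 \quad \text{a.e.}$$

□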



This completes the proof of theorem 2.



5 Discussions 

In this paper we consider the strong consistency of MLE for mixtures of location-scale distributions. We treat the case that the scale parameters of the component distributions are restricted from below by $c_n = \exp(-n^d)$, $0 < d < 1$, and give the regularity conditions for the strong consistency of MLE.

As in the case of the uniform mixture in Tanaka and Takemura, it is readily verified that if $c_n$ decreases to zero faster than $\exp(-n)$, then the consistency of MLE fails. Therefore the rate $c_n = \exp(-n^d)$, $0 < d < 1$, obtained in this paper is almost the lower bound of the order of $c_n$ which maintains the strong consistency.
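A rough numerical reading of why $d = 1$ is the threshold: if the largest attainable density height is taken to be of order $M v_0 / c_n$ (a simplifying assumption made here for illustration, with hypothetical values of $M$, $v_0$ and $c_0$), then the contribution of a single observation to the averaged log likelihood is at most of order $\frac{1}{n} \log(M v_0 / c_n)$, which vanishes exactly when $d < 1$.

```python
import math

def per_observation_log_height(n, d, M=2, v0=0.4, c0=1.0):
    """(1/n) * log(M * v0 / c_n) with c_n = c0 * exp(-n**d); M, v0, c0 are hypothetical."""
    return (math.log(M * v0 / c0) + n ** d) / n

for d in (0.5, 0.9, 1.0):
    values = [round(per_observation_log_height(n, d), 4) for n in (10**2, 10**4, 10**6)]
    print(d, values)
```

For $d < 1$ the printed values shrink towards zero as $n$ grows, while for $d = 1$ they stay close to one, so a single observation sitting on a very narrow component would no longer be negligible.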

Although we treat the univariate case in this paper, it is clear that the result obtained in this paper can be extended to the multivariate case under the condition that components are bounded and their tails decrease to zero fast enough if the minimum singular values of the scale matrices of the components are restricted from below by $c_n$.

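Finally let us consider some sufficient conditions for the regularity conditions. For $\theta_m \in \Theta_m$ and any positive real number $\rho$, let

$$f_m(x; \theta_m, \rho) = \sup_{\mathrm{dist}(\theta'_m, \theta_m) \le \rho} f_m(x; \theta'_m) .$$

Let $\Gamma$ be any compact subset of $\Theta_m$. Consider the following two conditions.

Assumption 5. For each $\theta_m \in \Gamma$ and sufficiently small $\rho$, $f_m(x; \theta_m, \rho)$ is measurable.

Assumption 6. For each $\theta_m \in \Gamma$, if $\lim_{j \to \infty} \theta_m^{(j)} = \theta_m$, then $\lim_{j \to \infty} f_m(x; \theta_m^{(j)}) = f_m(x; \theta_m)$ for all $x$.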

If assumptions and IHl hold, then it is easily verified that assumptions El and El hold. 
Thus assumptions |TJ (H and El are sufficient conditions for regularity conditions and 
assumptions El and El are checked more easily. For example, finite mixture density which 
consists of normal density, t-density and uniform density on an open interval satisfies 
assumptions [TJ (H El and El 
Table 1: List of notations ($\theta \in \Theta$; $\mathcal{K} \subset \{1, \ldots, M\}$; $V \subset \mathbb{R}$; $y, \rho, \mu, w \in \mathbb{R}$)



Notation 


Definition or description 


M 


Number of components 


9x 


Subvector of G consisting of the components in 




kdx = jc'^ G Hj ; Parameter space of t^,x ; "^ee (j2|) 




lr{x;9.j(r) = Y.k&.j^(^kfk{x;pk,c^k) ; See © 


f.je{x; 0,x,p) 


f.^{x]Ojf,p) = supdist((?v,ejr)<p/-^(^5^.V) ; See dHj) 




= {/^(x; ^,^) 1 G e,^} ; See © 




^^ = U|^l<^^;r ; See © 




Hm = 0, cxm = 1) < min{fo , f i • x ; See Assumption [T] 




Cn = Co ■ exp{—n'^) ; See tlieorem|2l 


B 


B = Vo/kq ; See (123) 


Pi ^\y) 


p = —, ^[y) = y ' 


J{6) 


t//i\ II r /\ /T^^^^ 
-'(^) = \Jm&.X'„<,,}^^rn " Z^(cTm),/im + ^KCTm)) ] ScC ([IDI) 


■m 


Interval of step function; See lemma El 


H{jm) 


Height of step function in Jt{0) ', See lemma El 


w{Jt{e)) 


Width of Jt{e); See lemma El 




Number of steps ; See lemma El 




V2 ^2h^y {vo{M + l)f , ay) ^V2-(^Y ; See m 


Uq, Ui 


fix;Oo) < minjuo, ui ■ \x\~^} ; See lemma 111 




Xn,i = min {xi, . . . , x„}, Xn,n = max {xi, . . . , Xn} ; See lemma El 




■i+C _ 




min^e^^i^^{min{|/i^^) + z/(a^))|,|/i(^)-z/((T^))|}} ; See ^ 


< 


s^o = (-CX), -Aq] U [Aq, oo) ; See (jUj) 


'•^ 

"^crj,0) ^u\oai ^\\i\\'X) 


Disjoint subset of =Sf; See lemma [71 


'^u<cqi •^a'>Bi '^\^\>Ao 


Disjoint subset of {1, . . . , M}; See ^ 




J(fR = ^\{J(faio U '-^a^oD U .^\fj,\ioo} subsectlou 14.21 
= {1, . . . , M}\{J^<co U .^>_B U .^^|>Ao} ill subsection 14.11 


PoiV) 


Po{V) = Jyf{x;eo)dx;Seem 


RniV) 


Number of observations which belong to a set V 




Open ball with center 6 and radius p{6) 




< = Co ■ exp (-72^/^) ; See (gH) 




See (|421) 




B„ = G B 1 dm s.t. c„ < (Tm < Co or > A^] 




See ((2HI) 




0n„^,s = {0^ ©n,^ 1 e =^(^V.>P(^yj); See (Ei 




0n,^,s,T,r = e e:,,,^,, 1 T(^) = T , Tn{e) = t} ; See m 




p'i{9) denote the middle point of Jt{d) 




See (gHI) 




See dSSI) 


Rn[p,w] 


= -w,p + w]); See (gHD 



